Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regex tokenizing bug #131

Merged
merged 1 commit into from
Sep 24, 2019
Merged

Conversation

CaseyCarter
Copy link
Member

Description

We skip a non-match character in regex_iterator::operator++ after a zero-length match to avoid repeat matches, resulting in incorrect behavior when tokenizing with a regex to match the delimiters between tokens.

Fixes DevCom#733051.

[This is a replay of internal PR 203601].

Checklist:

  • I understand README.md.
  • If this is a feature addition, that feature has been voted into the C++
    Working Draft. (N/A)
  • Any code files edited have been processed by clang-format 8.0.1.
    (The version is important because clang-format's behavior sometimes changes.)
  • Identifiers in any product code changes are properly _Ugly as per
    https://eel.is/c++draft/lex.name#3.1 .
  • Identifiers in test code changes are not _Ugly.
  • Test code includes the correct headers as per the Standard, not just
    what happens to compile.
  • The STL builds and test harnesses have passed (must be manually verified
    by an STL maintainer before CI is online, leave this unchecked for initial
    submission).
  • This change introduces no known ABI breaks (adding members, renaming
    members, adding virtual functions, changing whether a type is an aggregate or
    trivially copyable, etc.). If unsure, leave this box unchecked and ask a
    maintainer for help.

We skip a non-match character in `regex_iterator::operator++` after a zero-length match to avoid repeat matches, resulting in incorrect behavior when tokenizing with a regex to match the delimiters between tokens.

Fixes [DevCom#733051](https://developercommunity.visualstudio.com/content/problem/733051/splitting-a-string-with-a-regex-returns-seemingly.html).
@CaseyCarter CaseyCarter requested a review from a team as a code owner September 24, 2019 18:58
@@ -26,9 +26,6 @@ _STL_DISABLE_CLANG_WARNINGS
#pragma push_macro("new")
#undef new

// TRANSITION, Several names herein are declared as _Xxx_xxx2; in all such cases the corresponding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For folks reading, these comments were transformed into tests that aren't yet present here in GitHub (I'm working on it right now, sorry!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the folks wondering why it matters: ABI.

When we change the relationship between an internal function _Fn and public function fn, we have to change the name of _Fn to avoid "old" fn in already-compiled code "in the wild" linking to freshly-compiled code and consequently calling the "new" _Fn with catastrophic results. Ditto "new" fn must not call "old" _Fn. When the change touches the function parameters, generally the mangled name of _Fn is different. If the change doesn't alter the function parameters, we'll change the name of _Fn directly, say by adding a 2.

When the name is changed directly the old name _Fn becomes available. We could then inadvertently reuse it for something else, with potentially the same catastrophic results. Rather than simply keeping a list of names to avoid - which people would have to check when writing new code - we have a test that crashes if such a name is reused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants